Abstract: Recent work on the foundations of optimization has begun to uncover its rich underlying structure. In particular, the “No Free Lunch” (NFL) theorems [WM97] state that any two algorithms are equivalent when their performance is averaged across all possible problems. This highlights the need to exploit problem-specific knowledge to achieve better than random performance. In this paper we present a general framework covering most search scenarios. In addition to the optimization scenarios addressed in the NFL results, this framework covers multi-armed bandit problems and the evolution of multiple co-evolving agents. As a particular instance of the latter, it covers “self-play” problems. In these problems the agents work together to produce a champion, who then engages one or more antagonists in a subsequent multi-player game. In contrast to the traditional optimization case where the NFL results hold, we show that in self-play there are free lunches: in coevolution some algorithms have better performance than other algorithms, averaged across all possible problems. However, in the typical coevolutionary scenarios encountered in biology, where there is no champion, NFL still holds.

I. Introduction 

Optimization algorithms have proven to be valuable in almost every setting where quantitative figures of merit are available. Recently, the mathematical foundations of optimization have begun to be uncovered [KWM01], [MW96], [FHS01], [IT01], [CO01]. One particular result in this work, the “No Free Lunch” (NFL) theorems, establishes the equivalent performance of all optimization algorithms when averaged across all possible problems [KWM01]. Numerous works have extended these early results, and considered their application to different types of optimization (e.g. to multi-objective optimization [CK03]). The web site www.no-free-lunch.org offers a list of recent references.

However, all previous work has been cast in a limited manner that does not cover repeated game scenarios where the figure of merit can vary based on the response of another player. In particular, the NFL theorems do not cover such scenarios. These game-like scenarios are usually called “coevolutionary” since they involve the behaviors of more than a single agent or player [FP00].

One important example of coevolution is “self-play”, where the players cooperate to train one of them as a champion. That champion is then pitted against an antagonist in a subsequent multi-player game. The goal is to train that champion to perform as well as possible in that subsequent game. For a checkers example see [CF99]. We will refer to all players other than the one of direct attention as that player’s “opponents”, even when (as in self-play) the players are actually cooperating. (Sometimes when discussing self-play we will refer to the specific opponent to be faced by a champion in a subsequent game, an opponent not under our control, as the champion’s “antagonist”.)

Coevolution can also be used for problems that on the 
surface appear to have no connection to a game (for an early 
application to sorting networks see [Hil92]). Coevolution in 
these cases enables escape from poor local optima in favor of 
better local optima. 

In this paper we first present a mathematical framework that covers both traditional optimization and coevolutionary scenarios. (It also covers other scenarios like multi-armed bandits.) We then use that framework to explore the differences between traditional optimization and coevolution. We find dramatic differences between the traditional optimization and coevolutionary scenarios. In particular, unlike the fundamental NFL result for traditional optimization, in the self-play domain there are algorithms which are superior to other algorithms for all problems. However, in the typical coevolutionary scenarios encountered in biology, where there is no champion, NFL still holds.

II. General Framework 

In this section we present a general framework, and illustrate it with several examples. Despite its substantially greater breadth of applicability, the formal structure of this framework is only a slight extension of that used in [WM97].

A. Formal framework specification

Say we have two spaces, $X$ and $Z$. To guide the intuition, a typical scenario might have $x \in X$ be the joint strategy followed by our player(s), and $z \in Z$ be the probability distribution over some space of possible rewards/payoffs to the champion, or over possible figures of merit, or some such.

In addition to $X$ and $Z$, we also have a fitness function

$$f : X \to Z. \tag{1}$$

In the example where $z$ is a probability distribution over rewards, $f$ can be viewed as the specification of an $x$-conditioned probability distribution of rewards.

We have a total of $m$ time-steps, and represent the information generated through those time-steps as

$$d_m \equiv (d^x_m, d^z_m) = \left(\{d^x(t)\}_{t=1}^{m}, \{d^z(t)\}_{t=1}^{m}\right). \tag{2}$$

Each $d^x(t)$ is a particular $x \in X$. Each $d^z(t)$ is a (perhaps stochastic) function of $f(d^x(t))$. For example, say the $z$'s (the values of $f(x)$) are probability distributions over reward values. Then $d^z(t)$ could consist of the full distribution $f(d^x(t))$. Alternatively, it could consist of a moment of that distribution, or even a random sample of it. In general, we allow the function specifying $d^z(t)$ to vary with $t$, although that freedom will not be exploited here. As shorthand we will write $d(t)$ to mean the pair $(d^x(t), d^z(t))$.

A search algorithm $a$ is an initial distribution $P_1(d^x(1))$, together with a set of $m - 1$ separate conditional distributions $P_t(d^x(t) \mid d_{t-1})$, $t = 2, \ldots, m$. Such an algorithm specifies what $x$ to choose, based on the information uncovered so far, for any time-step $t$. Finally, we have a vector-valued cost function, $C(d_m, f)$, which we use to assess the performance of the algorithm. Often our goal is to find the $a$ that will maximize $E(C)$, for a particular choice of how to form the $d^z(t)$'s.

The NFL theorems concern averages over all $f$ of quantities involving $C$. For those theorems to hold, that is, for $f$-averages of $C$ to be independent of the search algorithm, it is crucial that $C$ does not depend on $f$. (The framework in [WM97] defines cost functions as real-valued functions of $d_m$.) When that independence is relaxed, the NFL theorems need not hold. Such relaxation occurs in self-play, for example, and is how one can have free lunches in self-play. This paper explores this phenomenon.

B. Examples of the framework 

a) Example 1: One example of this framework is the scenario considered in the NFL theorems. There each $z$ is a probability distribution over a space $Y \subset \mathbb{R}$. For convenience we take $X$ and $Y$ countable. Each $d^z(t)$ is a sample of the associated function $z(t) = f(d^x(t))$. The search algorithm is constrained so that

$$P_t(d^x(t) = x \mid d_{t-1}) = 0 \quad \forall x \in d^x_{t-1}, \tag{3}$$

i.e., so that the search never revisits points already sampled.¹ Finally, $C(d_m, f)$ is allowed to be any scalar-valued function that depends on $d_m$ exclusively.

The NFL theorems apply to any scenario meeting these specifications.
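
As a minimal sketch of Example 1 (an illustration, not part of the original analysis), the following Python enumeration checks the NFL property on a toy problem. The sizes ($|X| = 3$, $Y = \{0, 1\}$, $m = 2$), the two algorithms `fixed_order` and `adaptive`, and the choice of $C$ as the largest observed value are all assumptions made for this example.

```python
from itertools import product

X = range(3)      # toy search space (assumed size)
Y = (0, 1)        # toy set of fitness values (assumed)
M = 2             # number of time-steps (assumed)


def run(algorithm, f, m=M):
    """Generate d_m = [(x, f(x)), ...] with a deterministic, non-revisiting algorithm."""
    d = []
    for _ in range(m):
        x = algorithm(d)
        assert x not in {seen for seen, _ in d}, "revisits are not allowed"
        d.append((x, f[x]))
    return d


def fixed_order(d):
    """Visit 0, 1, 2, ... regardless of what has been observed."""
    return len(d)


def adaptive(d):
    """Start at 2, then let the observed value decide the next point."""
    if not d:
        return 2
    return 0 if d[-1][1] == 1 else 1


def cost_histogram(algorithm):
    """Histogram of C = max observed value, taken over every f : X -> Y."""
    hist = {}
    for f in product(Y, repeat=len(X)):
        c = max(y for _, y in run(algorithm, f))
        hist[c] = hist.get(c, 0) + 1
    return hist


print(cost_histogram(fixed_order))
print(cost_histogram(adaptive))
```

Both algorithms yield the same histogram of cost values once all eight functions are enumerated, which is the NFL statement for a cost that depends on $d_m$ alone.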

b) Example 2: Another example is the multi-armed bandit problem introduced for optimization by Holland [Hol75] and thoroughly analyzed in [MW98]. The scenario for that problem is identical to that for the NFL results, except that there are no constraints on the search algorithm, $Y = \mathbb{R}$, and every $z$ is a Gaussian. The fact that revisits are allowed means that NFL need not apply.

c) Example 3: Self-play is identical to the NFL scenario except that $C$ depends on $f$. This dependence is based on a function $A(d_m)$ mapping $d_m$ to a subset of $X$. Intuitively, $A$ specifies the details of the champion, based on the $m$ repeated games and on the possible responses to the champion of an antagonist in a subsequent game. $C$ is then based on this specification of the champion. Formally, it uses $A$ to determine the quality of the search algorithm that generated $d_m$ as follows:

$$C(d_m, f) = \min_{x \in A(d_m)} E(f(x)), \tag{4}$$

where $E(f(x))$ is the expected value of the distribution of payoffs $f(x)$ obtained from $x$, $E(f(x)) = \sum_{y \in Y} y\, P_f(y \mid x) = \sum_{y \in Y} y\, [f(x)](y)$. Intuitively, this measure is the worst possible payoff to the champion.

¹ This requirement is just to “normalize” algorithms. In general, an algorithm that sometimes revisits points can outperform one that never does. Our requirement simply says that we are purely focusing on how well the algorithms choose new points, not how smart they are about whether to finish the search at $t = m$ by sampling a new point or by returning to one already visited. See [WM97].

To see in more detail how this describes self-play, assume two players, with strategy spaces $X_1$ and $X_2$, $X_1$ being the strategy space of our champion. Take $x$ to be the joint strategy of our players in any particular game, i.e., $x \in X = X_1 \times X_2$. So $d^x_m$ specifies the $m$ strategies followed by our champion (as well as those of the other player) during the $m$ training games. $d^z_m$ is the associated set of rewards to our champion, i.e., each $d^z(t)$ is a sample of the distribution $f(d^x(t))$.

Let $x_1 \in X_1$ be the strategy our champion elects to follow based on that training data. Note that this strategy can be represented as the set of all joint strategies $x$ whose first component is $x_1$. We adopt this representation, and write the strategy chosen by our champion, i.e., the set of all $x$'s consistent with the champion’s choice of strategy $x_1$, as $A(d_m) \subset X$.

Say the antagonist our champion will now face is able to choose the worst possible element of $X_2$ (as far as expected reward to our champion is concerned), given that our champion chooses strategy $A(d_m)$. If the antagonist does this, the expected reward to our champion is given by $C(d_m, f)$ as defined above. Obvious variants of this setup replace the worst-case nature of $C$ with some alternative, have $A$ be stochastic, etc. Whatever variant we choose, typically our goal in self-play is to choose $a$ and/or $A$ so as to maximize $E(C)$, the expectation being over all possible $d_m$. The fact that $C$ depends on $f$ means that NFL need not apply. The mathematical structure that replaces NFL is explored in the following sections of this paper.
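
The worst-case cost of Example 3 can be illustrated with a short Python sketch. The dictionary representation of $f$ (joint strategy mapped to a payoff distribution), the helper names, and the numerical payoffs below are hypothetical choices made only for this illustration.

```python
def expected_payoff(dist):
    """Expected value of a payoff distribution given as {payoff: probability}."""
    return sum(y * p for y, p in dist.items())


def champion_set(x1, X2):
    """A(d_m): all joint strategies whose first component is the champion's x1."""
    return {(x1, x2) for x2 in X2}


def self_play_cost(f, x1, X2):
    """Worst expected payoff to the champion over the antagonist's strategies."""
    return min(expected_payoff(f[x]) for x in champion_set(x1, X2))


# Hypothetical 2 x 2 game: f maps each joint strategy to a distribution over payoffs.
X2 = (0, 1)
f = {
    (0, 0): {1: 0.5, 0: 0.5},
    (0, 1): {1: 1.0},
    (1, 0): {1: 0.25, 0: 0.75},
    (1, 1): {1: 0.75, 0: 0.25},
}

print(self_play_cost(f, 0, X2))  # champion strategy 0
print(self_play_cost(f, 1, X2))  # champion strategy 1
```

Because the cost is evaluated on $f$ itself, and not only on the training data $d_m$, it falls outside the class of cost functions covered by the NFL theorems.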

d) Example 4: The basic description of self-play in the 
introduction looks like a special case of the more general 
biological coevolution scenario. However in terms of our 
framework they are quite different. 

In the general coevolution scenario there are a total of $N$ agents (or players, or species, or genes, etc.). Their strategy spaces are written $X_i$, as in self-play. Now though $X$ is extended beyond the current joint strategy, to include the previous joint “population frequency” value. Formally, we write

$$X = (X_1, u_1) \times \cdots \times (X_N, u_N), \tag{5}$$

and interpret each $u_i \in \mathbb{R}$ as agent $i$’s previous population frequency. As explained below, the reason for this extension of $X$ is so that $a$ can give the sequence of joint population frequencies that accompanies the sequence of joint strategies.

In the general coevolution scenario each $z$ is a probability distribution over the possible current population frequencies of the agents. So given our definition of $X$, we interpret $f$ as a map taking the previous joint population frequency, together with the current joint strategy of the agents, into a probability distribution over the possible current joint population frequencies of the agents.



As an example, in evolutionary game theory, the joint strategy of the agents at any given $t$ determines the change in each one’s population frequency in that time-step. Accordingly, in the replicator dynamics of evolutionary game theory, $f$ takes a joint strategy $x_1 \times \cdots \times x_N$ and the values of all agents’ previous population frequencies, and based on that determines the new value of each agent’s population frequency.

As before, each $d^z(t)$ contains the information coming out of $f(d^x(t))$. Here that information is the set of current population frequencies. The search algorithm $a$ now plays two roles. One of these is to directly incorporate those current population frequencies into the $\{u_i\}$ components of $d^x(t + 1)$. The other is, as before, to determine the joint strategy $[x_1, \ldots, x_N]$ for time $t + 1$. As in self-play, this strategy of each agent $i$ is given by a (potentially stochastic and/or time-varying) function $a_i$. An application of $a$ is given by the simultaneous operation of all those $N$ distinct $a_i$ on a common $d_t$, as well as the transfer of the joint population frequency from $d^z(t)$, to produce $d^x(t + 1)$.

Note that the choice of joint strategy given by $a$ may depend on the previous time-step’s frequencies. As an example, this corresponds to sexual reproduction in which mating choices are random.² However, in the simplest version of evolutionary game theory, the joint strategy is actually constant in time, with all the dynamics occurring via frequency updating in $f$. If the agents are identified with distinct genomes, then in this version reproduction is parthenogenetic.

Finally, $C$ is now a vector with $N$ components, each component $j$ only depending on the associated $d_t(j)$. In general in biological coevolution scenarios (e.g., evolutionary game theory), there is no notion of a champion being produced by the search and subsequently pitted against an antagonist in a “bake-off”. Accordingly, there is no particular significance to results for $C$’s that depend on $f$.

This means that so long as we make the approximation, reasonable in real biological systems, that $x$’s are never revisited, all of the requirements of Example 1 are met. This means that NFL applies. So in particular, say we restrict attention to the particular kinds of $a$’s of evolutionary game theory. Then any two choices of $a$ (any two sets of strategy-making rules $\{a_i\}$) perform just as well as one another, averaged over all $f$’s. More generally, we can consider other kinds of $a$ as well, and the result still holds.

III. Application to Self-play 

In Example 3 of Section II-B we introduced the self-play model. In the remainder of this paper we show how free lunches may arise in this setting, and quantify the a priori differences between certain self-play algorithms. For expository simplicity, we modify the definitions introduced in the general framework, to tailor them for self-play.

In self-play, agents (or game strategies) are paired against each other in a (perhaps stochastically formed) sequence to generate a set of 2-player games. After $m$ distinct training games between an agent and its opponents, the agent enters a competition. Performance of the agent is measured with a payoff function. As shorthand, the (here deterministic) payoff function when the $i$th agent plays move (strategy) $\underline{x}_i$ and $i$’s opponent plays $\bar{x}_i$ is written as $f_i(\underline{x}_i, \bar{x}_i)$. If we indicate the joint move of $i$ and its opponent as $x_i = (\underline{x}_i, \bar{x}_i)$ we can write the payoff to agent $i$ as $f_i(x_i)$. In the following we make no assumption about the structure of moves except that they are finite. For example, $\underline{x}$ might represent a sequence of plays representing an entire game of checkers and $\bar{x}$ might represent a complete set of opponent responses to each play. The payoff function $f(\underline{x}, \bar{x})$ might then represent the outcome of the game as $+1$ for a win for $i$, $0$ for a draw, and $-1$ for a loss. Illegal joint moves can be eliminated by appropriately limiting the space of moves and opponent responses in order to satisfy the rules of the game. In other applications, $\underline{x}$ might represent an algorithm to sort a list and $\bar{x}$ a mutable set of lists to be sorted. The payoff would then reflect the ability of the algorithm to sort those lists in $\bar{x}$.

² Obvious elaborations of the framework allow $x$ to include relative rewards from the preceding round, as well as frequencies. This allows mate selection to be based on current differential fitness, as well as overall frequency in the population.

We define the payoff for agent $i$ playing move $\underline{x}_i$ independent of an opponent’s reply, $g_i(\underline{x}_i)$, as the least payoff over all possible opponent responses (a minimax criterion): $g_i(\underline{x}_i) = \min_{\bar{x}_i} f_i(\underline{x}_i, \bar{x}_i)$. With this criterion, the best move an agent can make is the move which maximizes $g_i$, so that its performance in competition (over all possible opponents) will be as good as possible. We are not interested in search strategies just across $i$’s possible moves, but more generally across all joint moves of $i$ and its opponents. (Note that whether that opponent varies or not is irrelevant, since we are setting its moves.) The ultimate goal is to maximize $i$’s minimax performance $g_i$.
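
The minimax criterion can be written down directly for a deterministic payoff table. In the Python sketch below, the nested-list layout (`payoff[move][reply]`), the function names, and the example payoffs are assumptions made only for illustration.

```python
def minimax_values(payoff):
    """g(move) = least payoff over all opponent replies, for each agent move."""
    return [min(row) for row in payoff]


def best_move(payoff):
    """Index of the move that maximizes the minimax value g."""
    g = minimax_values(payoff)
    return max(range(len(g)), key=g.__getitem__)


# Hypothetical payoffs: +1 win, 0 draw, -1 loss; rows are agent moves.
payoff = [
    [1, -1, 0],
    [0, 0, 1],
    [1, 1, -1],
]

print(minimax_values(payoff))
print(best_move(payoff))
```

Note that the move with the most wins (the third row) is not the minimax choice, since a single losing reply fixes its value of $g$.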

We make one important observation. In general, using a random pairing strategy in the training phase will not result in a training set that can be used to guarantee that any particular move in the competition is better than the worst possible move. The only way to ensure an outcome guaranteed to be better than the worst possible is to exhaustively explore all possible responses to move $\underline{x}$, and then determine that the worst value of $f_i$ for all such joint moves is better than the worst value for some other move, $\underline{x}'$. To do this certainly requires that $m$ is greater than the total number of possible moves by the opponent, but even for very large $m$, unless all possible opponent responses have been explored, we cannot make any such guarantees.

Pursuing this observation further, consider the situation where we know (perhaps through exhaustive sampling of opponent responses) that the worst possible payoff for some move $\underline{x}$ is $g(\underline{x})$ and that another joint move $x' = (\underline{x}', \bar{x}')$ with $\underline{x} \neq \underline{x}'$ results in a payoff $f(x') < g(\underline{x})$. In this case there is no need to explore other opponent responses to $\underline{x}'$ since it must be that $g(\underline{x}') < g(\underline{x})$, i.e., $\underline{x}'$ is minimax inferior to $\underline{x}$. Thus, considering strategies for searching across the space of joint moves $x_i$, any algorithm that avoids searching regions which are known to be minimax inferior (as above) will be more efficient than one which searches these regions (e.g. random search). This applies for all $g_i$, and so the smarter algorithm will have an average performance greater than the dumb algorithm. Very roughly speaking, this result avoids NFL implications because uniformly varying over all $g_i$ does not uniformly vary over all possible $f_i$, which are the functions that ultimately determine performance.
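
The pruning argument can be made concrete with a small Python sketch. It assumes a deterministic payoff table, a fixed scan order over moves, and hypothetical names throughout; it is an illustration of the idea of skipping minimax-inferior moves, not the search rule analyzed later in the paper.

```python
def pruned_minimax_search(payoff):
    """Scan moves in order, abandoning a move once it is known to be minimax inferior.

    Returns (best move, its minimax value, number of payoff evaluations used).
    """
    evaluations = 0
    best_x, best_g = None, None
    for x, row in enumerate(payoff):
        worst = None
        pruned = False
        for value in row:
            evaluations += 1
            worst = value if worst is None else min(worst, value)
            # f(x') < g(x) for a fully explored x implies g(x') < g(x).
            if best_g is not None and worst < best_g:
                pruned = True
                break
        # Only a fully explored move can become the new incumbent.
        if not pruned and (best_g is None or worst > best_g):
            best_x, best_g = x, worst
    return best_x, best_g, evaluations


# Hypothetical 3 x 3 payoff table (rows: agent moves, columns: opponent replies).
payoff = [
    [0, 0, 1],
    [-1, 1, 0],
    [1, -1, 1],
]

print(pruned_minimax_search(payoff))
```

On this table the pruned scan needs 6 payoff evaluations, compared with the 9 needed to explore every joint move.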

In the following sections we explore this observation further. 

A. Definitions

As much as possible we follow the notation of [WM97], extending it where necessary. That paper should be consulted as motivation for the analysis framework we employ. Without loss of generality we now consider two-player games, and leave the agent index $i$ implicit. If there are $\underline{l}$ moves available to an agent, we label these by $\underline{x} \in \underline{X} = [1, \ldots, \underline{l}]$. For each such move we assume the opponent may choose from one of $\bar{l}(\underline{x})$ possible moves forming the space $\bar{X}(\underline{x})$.³ For simplicity we will take $\bar{X}(\underline{x})$ to be independent of $\underline{x}$. Consequently, the size of the joint move space is $|X| = \sum_{\underline{x}=1}^{\underline{l}} \bar{l}(\underline{x})$. If the training period consists of $m$ distinct joint moves, even with $m$ as large as $|X| - \underline{l}$, we cannot guarantee that the agent will not choose the worst possible move in the competition, as the worst possible move could be the opponent response that was left unexplored for each of the $\underline{l}$ possible moves.

In [WM97] a population is a sample of distinct points from the input space $X$ and their corresponding fitness values. In this coevolutionary context the notion of a population of sampled configurations needs to be extended to include opponent responses. For simplicity we assume that fitness payoffs are a deterministic function of joint moves. Thus, rather than the more general output space $Z$, we assume payoff values lie in a finite totally ordered space $Y$. Consequently, the fitness function is the mapping $f : X \to Y$ where $X = \underline{X} \times \bar{X}$ is the space of joint moves. As in the general framework a population of size $m$ is represented as

$$d_m = \{(d^x_m(1), d^y_m(1)), \ldots, (d^x_m(m), d^y_m(m))\},$$

where $d^x_m(i) = \{d^{\underline{x}}_m(i), d^{\bar{x}}_m(i)\}$ and $d^y_m(i) = f(d^{\underline{x}}_m(i), d^{\bar{x}}_m(i))$, and $i \in [1, \ldots, m]$ labels the samples taken. In the above definition $d^{\underline{x}}_m(i)$ is the $i$th move made by the agent, $d^{\bar{x}}_m(i)$ is the opponent response, and $d^y_m(i)$ is the corresponding payoff. As usual we assume that no joint configurations are revisited. A particular coevolutionary optimization task is specified by defining the payoff function that is to be extremized. As discussed in [WM97], a class of problems is defined by specifying a probability density $P(f)$ over the space of possible payoff functions. As long as both $X$ and $Y$ are finite (as they are in any computer implementation) this is straightforward.

In addition to this extended notion of a population, there is an additional consideration in the coevolutionary setting, namely the decision of what move to make in the competition based upon the results of the training population. Formally, we encapsulate the process of making this decision as $\underline{A}$.⁴ $\underline{A}$ consists of a set of distributions (one for each $m$, since we would like $\underline{A}$ to select a move regardless of the size of the training set) of the form $P(\underline{x} \in \underline{X} \mid d_m)$. If $\underline{A}$ deterministically returns a single move, we indicate the mapping from training population to move as $\underline{A}(d_m)$. To summarize, the definition of search method is extended for self-play to include:

• A search rule $a$ which determines the manner in which a population is expanded during training and is formally given by the set of distributions $\{P_t(d^x(t) \mid d_{t-1})\}_{t=1}^{m}$. This corresponds exactly to the definition of a search algorithm in [WM97] used in non-coevolutionary optimization.

• A move-choosing rule $\underline{A}$ mapping probabilistically or deterministically to the single move used in the competition. We write $\underline{A}$ explicitly as the probability density $\underline{A}(\underline{x} \mid d_m)$ where $\underline{x} \in \underline{X}$. For deterministic $\underline{A}$ we write the density as $\underline{A}(\underline{x} \mid d_m) = \delta(\underline{x} - \underline{A}(d_m))$.

The tuple $(a, \underline{A})$ is called a search process (as opposed to a search algorithm in [WM97]).

³ Note that the space of opponent moves varies with $\underline{x}$. This is the typical situation in applications to games with complex rules (e.g. checkers).

⁴ The notation $\underline{A}$ is meant to suggest that, unlike the $A(d_m)$ function introduced earlier, $\underline{A}$ defines only the champion’s move, and not the possible responses to this move.

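The search process seeks a strategy that will perform well in competition. If $\underline{A}$ is deterministic, the natural measure of the performance of search process $(a, \underline{A})$ obtained during training is $C = \min_{\bar{x} \in [1, \bar{l}]} f(\underline{A}(d_m), \bar{x})$. (If $\underline{A}$ is not deterministic then we use the weighted average $\sum_{\underline{x} \in \underline{X}} \min_{\bar{x} \in [1, \bar{l}]} f(\underline{x}, \bar{x})\, \underline{A}(\underline{x} \mid d_m)$.) The best $(a, \underline{A})$ for a particular $f$ are those which maximize $C$.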

The traditional version of NFL (for traditional optimization) defines the performance differently since there is no opponent. In the simplest case the performance of $a$ (recall that there is no choosing algorithm) might be measured as $C = \max_{i \in [1, m]} d^y_m(i)$. One traditional NFL result states that the average performance of any pair of algorithms is identical, or formally, $\sum_f P(C \mid f, m, a)$ is independent of $a$.⁵ A natural extension of this previous result considers a non-uniform average over fitness functions. In this case the quantity of interest is $\sum_f P(C \mid f, m, a) P(f)$, where $P(f)$ weights different fitness functions. NFL results can be proven for other non-uniform $P(f)$ [SVW01].

A result akin to this one in the self-play setting would state that the uniform average $\sum_f P(C \mid f, m, a, \underline{A})$ is independent of $a$ and $\underline{A}$. However, as we have informally seen, such a result cannot hold in general since a search process with an $a$ that exhausts an opponent’s repertoire of moves has better guarantees than other search processes. A formal proof of this statement is presented in the next section.

IV. An Intuitive Example 

Before proving the existence of free lunches we give a motivating example, both to illustrate the definitions made in the above section and to show why we might expect free lunches to exist. Consider the concrete case where the player has two possible moves, i.e. $\underline{X} = \{1, 2\}$, the opponent has two responses for each of these moves, i.e. $\bar{X} = \{1, 2\}$, and there are only two possible payoff values, i.e. $Y = \{1/2, 1\}$. In this simple case there are 16 possible functions and these are listed in Table I. We can see that in this simple example the minimax criterion gives a very biased distribution over possible performance measures: 9/16 of the functions have $g = [1/2 \;\; 1/2]$, 3/16 have $g = [1/2 \;\; 1]$, 3/16 have $g = [1 \;\; 1/2]$, and 1/16 have $g = [1 \;\; 1]$, where $g = [g(\underline{x} = 1) \;\; g(\underline{x} = 2)]$.

⁵ Actually far more can be said, and the reader is encouraged to consult [WM97] for details.

TABLE I. Exhaustive enumeration of all possible functions $f(\underline{x}, \bar{x})$ and $g(\underline{x}) = \min_{\bar{x}} f(\underline{x}, \bar{x})$ for $\underline{X} = \{1, 2\}$, $\bar{X} = \{1, 2\}$, and $Y = \{1/2, 1\}$. The payoff functions labeled in bold are those consistent with the population $d_2 = \{(1, 2; 1/2), (2, 2; 1)\}$.

If we consider a particular population, say $d_2 = \{(1, 2; 1/2), (2, 2; 1)\}$, the payoff functions that are consistent with this population are $f_9$, $f_{10}$, $f_{13}$, $f_{14}$, and the corresponding distribution over $g$ functions is $1/2$ on $[1/2 \;\; 1/2]$ and $1/2$ on $[1/2 \;\; 1]$. Given the fact that any population will give a biased sample over $g$ functions, it may not be surprising that there are free lunches. We might expect that an algorithm which is able to exploit this biased sample would perform uniformly better than another algorithm which does not exploit the biased sample of $g$’s. In the next section we prove the existence of free lunches by constructing such a pair of algorithms.
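
The counts quoted in this example can be reproduced by brute force. The Python sketch below enumerates all 16 payoff functions; the dictionary encoding of $f$ and the function names are assumptions of this illustration, and no attempt is made to reproduce the numbering of the functions used in Table I.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

HALF, ONE = Fraction(1, 2), Fraction(1)
JOINT = [(1, 1), (1, 2), (2, 1), (2, 2)]   # (agent move, opponent reply)


def all_payoff_functions():
    """Yield every f : joint move -> {1/2, 1} as a dictionary (16 in total)."""
    for values in product((HALF, ONE), repeat=len(JOINT)):
        yield dict(zip(JOINT, values))


def g(f):
    """Minimax vector [g(1), g(2)], with g(x) the least payoff over replies."""
    return tuple(min(f[(x, 1)], f[(x, 2)]) for x in (1, 2))


def g_distribution(functions):
    """Fraction of the given payoff functions that induce each g vector."""
    functions = list(functions)
    counts = Counter(g(f) for f in functions)
    return {k: Fraction(v, len(functions)) for k, v in counts.items()}


def consistent(f, population):
    """True if f reproduces every (move, reply, payoff) triple in the population."""
    return all(f[(x, xb)] == y for x, xb, y in population)


prior = g_distribution(all_payoff_functions())
d2 = [(1, 2, HALF), (2, 2, ONE)]
posterior = g_distribution(f for f in all_payoff_functions() if consistent(f, d2))

print(prior)
print(posterior)
```

The uniform distribution over $f$ induces the non-uniform distribution over $g$ stated above, and conditioning on $d_2$ leaves four functions split evenly between two $g$ vectors.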

V. Proof of Free Lunches

In this section a proof is presented that there are free lunches for self-play by constructing a pair of search processes, one of which explicitly has performance equal to or better than the other for all possible payoff functions $f$. As in earlier NFL work we assume that both $|X|$ and $|Y|$ are finite.⁶ For convenience, and with no loss in generality, we normalize the possible $Y$ values so that they are equal to $1/|Y|, 2/|Y|, \ldots, 1$.

The pair of processes we construct use the same search rule $a$ (it is not important in the present context what $a$ is) but different deterministic move-choosing rules $\underline{A}$. In both cases a Bayesian estimate based on uniform $P(f)$ and the $d_m$ at hand is made of the expected value of $g(\underline{x}) = \min_{\bar{x}} f(\underline{x}, \bar{x})$ for each $\underline{x}$. Since we are striving to maximize the worst possible payoff from $f$, the optimal search process selects the move which maximizes this expected value while the worst process selects the move which minimizes this value. More formally, if $E(C \mid d_m, a, \underline{A})$ differs for the two choices of $\underline{A}$, always being higher for one of them, then $E(C \mid m, a, \underline{A}) = \sum_{d_m} P(d_m \mid a) E(C \mid d_m, a, \underline{A})$ differs for the two $\underline{A}$. In turn, $E(C \mid m, a, \underline{A}) = \sum_{f, C} [C \times P(C \mid f, m, a, \underline{A}) \times P(f)] \propto \sum_{f, C} [C \times P(C \mid f, m, a, \underline{A})]$ for the uniform prior $P(f)$. Since this differs for the two $\underline{A}$, so must $\sum_f P(C \mid f, m, a, \underline{A})$.

Let $g(\underline{x})$ be a random variable representing the value of $g(\underline{x})$ conditioned on $d_m$ and $\underline{x}$, i.e. it equals the worst possible payoff (to the agent) after the agent makes move $\underline{x}$ and the opponent replies. In the example of Section IV we have $E g(1) = 1/2$ and $E g(2) = 3/4$.

To determine the expected value of $g(\underline{x})$ we need to know $P(g(\underline{x}) \mid \underline{x}, d_m) = \sum_f P(g(\underline{x}) \mid \underline{x}, d_m, f) P(f)$ for uniform $P(f)$. Of the entire population $d_m$ only the subset sampled at $\underline{x}$ is relevant. We assume that there are $k(\underline{x}, d_m) \le m$ such values.⁷ Since we are concerned with the worst possible opponent response, let $r(\underline{x}, d_m)$ be the minimal $Y$ value obtained over the $k(\underline{x}, d_m)$ responses to $\underline{x}$, i.e. $r(\underline{x}, d_m) = \min_{\bar{x}} d^y_m(\underline{x}, \bar{x})$, the minimum being over the sampled responses $\bar{x}$ to $\underline{x}$. Since payoff values are normalized to lie between 0 and 1, $0 \le r(\underline{x}, d_m) \le 1$. Given $k(\underline{x}, d_m)$ and $r(\underline{x}, d_m)$, $P(g \mid \underline{x}, d_m)$ is independent of $\underline{x}$ and $d_m$, and so we indicate the desired probability as $\pi_{k,r}(g)$.

In Appendix A we derive the probability $\pi_{k,r}$ in the case where all $Y$ values are distinct (we do so because this results in a particularly simple expression for the expected value of $g$) and in the case where $Y$ values are not forced to be distinct. From these densities the expected value of $g(\underline{x})$ can be determined. In the case where $Y$ values are not forced to be distinct there is no closed form for the expectation. However, in the continuum limit where $|Y| \to \infty$ we find (see Appendix B)

$$E(g(\underline{x}) \mid \underline{x}, d_m) = \frac{1 - (1 - r(\underline{x}, d_m))^{\bar{l}(\underline{x}) - k(\underline{x}, d_m) + 1}}{\bar{l}(\underline{x}) - k(\underline{x}, d_m) + 1}, \tag{6}$$

where we have explicitly noted that both $k$ and $r$ depend on the move $\underline{x}$ as well as the training population $d_m$. As shorthand we define $C_m(\underline{x}) = E(g(\underline{x}) \mid \underline{x}, d_m)$.
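
Equation (6) is simple to evaluate numerically. The Python sketch below implements it and compares it with a midpoint-rule integral of $(1 - t)^{\bar{l} - k}$ over $[0, r]$, which is the tail-probability form of the expected minimum of $r$ and $\bar{l} - k$ independent uniform payoffs. That integral form is our own cross-check rather than the derivation of Appendix B, and the function names, as well as the convention $r = 1$ when no responses have been sampled, are assumptions of this illustration.

```python
def expected_minimax(r, l_bar, k):
    """Continuum-limit expectation of g, following the closed form of Eq. (6).

    r     : smallest payoff seen among the k sampled responses (r = 1 if k = 0)
    l_bar : total number of opponent responses to the move
    k     : number of responses already sampled
    """
    if not 0 <= k <= l_bar:
        raise ValueError("k must satisfy 0 <= k <= l_bar")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    n = l_bar - k + 1
    return (1.0 - (1.0 - r) ** n) / n


def expected_minimax_quadrature(r, l_bar, k, steps=100_000):
    """Cross-check: midpoint rule for the integral of (1 - t)^(l_bar - k) on [0, r]."""
    unseen = l_bar - k
    h = r / steps
    return h * sum((1.0 - (j + 0.5) * h) ** unseen for j in range(steps))


print(expected_minimax(1.0, 2, 0))     # no information about the move
print(expected_minimax(0.5, 4, 1))     # one of four responses seen, payoff 0.5
print(expected_minimax_quadrature(0.5, 4, 1))
```

With no sampled responses the expression reduces to $1/(\bar{l} + 1)$, the expected minimum of $\bar{l}$ independent uniform payoffs.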

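The best move given the training population is the deterministic choice $\underline{A}_{\text{best}}(d_m) = \arg\max_{\underline{x}} C_m(\underline{x})$ and the worst move is $\underline{A}_{\text{worst}}(d_m) = \arg\min_{\underline{x}} C_m(\underline{x})$. In the example of Section IV with the population of size 2, $\underline{A}_{\text{best}}(d_2) = 2$ and $\underline{A}_{\text{worst}}(d_2) = 1$.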

As long as $C_m(\underline{x})$ is not constant (which will usually be the case since the $r$ values will differ), $(a, \underline{A}_{\text{best}})$ and $(a, \underline{A}_{\text{worst}})$ will differ, and the expected performance of $\underline{A}_{\text{best}}$ will be superior. This proves that the expected performance over all payoff functions of algorithm $(a, \underline{A}_{\text{best}})$ is greater than that of algorithm $(a, \underline{A}_{\text{worst}})$.
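
For the two-move example of Section IV the free lunch can be verified exhaustively. The Python sketch below fixes a hypothetical search rule that always samples the joint moves $(1, 2)$ and $(2, 2)$, computes the Bayes estimate of $g$ exactly by enumerating the consistent payoff functions (rather than using the continuum formula), and averages the true minimax payoff of the chosen move over all 16 functions. Ties are broken toward the lower-numbered move; this tie rule and all names are assumptions of the illustration.

```python
from fractions import Fraction
from itertools import product

HALF, ONE = Fraction(1, 2), Fraction(1)
MOVES, REPLIES = (1, 2), (1, 2)
JOINT = [(x, xb) for x in MOVES for xb in REPLIES]
FUNCTIONS = [dict(zip(JOINT, v)) for v in product((HALF, ONE), repeat=len(JOINT))]


def g(f, x):
    """True minimax payoff of move x under payoff function f."""
    return min(f[(x, xb)] for xb in REPLIES)


def train(f, schedule=((1, 2), (2, 2))):
    """Assumed search rule a: sample a fixed pair of joint moves."""
    return [(x, xb, f[(x, xb)]) for x, xb in schedule]


def expected_g(x, population):
    """Bayes estimate of g(x): average over payoff functions consistent with d_m."""
    consistent = [h for h in FUNCTIONS
                  if all(h[(a, b)] == y for a, b, y in population)]
    return sum(g(h, x) for h in consistent) / len(consistent)


def choose(population, pick):
    """Move-choosing rule: pick=max gives A_best, pick=min gives A_worst."""
    return pick(MOVES, key=lambda x: expected_g(x, population))


def average_performance(pick):
    """Average over all f of the true minimax payoff of the chosen move."""
    return sum(g(f, choose(train(f), pick)) for f in FUNCTIONS) / len(FUNCTIONS)


print(average_performance(max))   # (a, A_best)
print(average_performance(min))   # (a, A_worst)
```

Under these assumptions the best rule averages 11/16 and the worst rule 9/16, and the best rule is never worse than the worst rule on any single payoff function.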


VI. Other Free Lunches 

We have shown the existence of free lunches for self-play by constructing a pair of algorithms with the same search rule $a$ but different move-choosing rules $\underline{A}$ and showing different performance. Unsurprisingly, we can also construct algorithms with different expected performance but having the same move-choosing rule and different search rules. In this section we provide a simple example of such a pair of algorithms. This should help demonstrate that free lunches are a rather common occurrence in self-play settings.

⁶ Recall that $|X| = \sum_{\underline{x}} \bar{l}(\underline{x})$.

⁷ Of course, we must also have $k(\underline{x}, d_m) \le \bar{l}(\underline{x})$ for all populations $d_m$.

Take the simple case where all $\underline{l}$ agent moves offer the same possible number of opponent responses, say $\bar{l}$. Consider a search rule that explores $m = \bar{l}$ distinct joint samples. Agent moves are labeled by $\underline{x} \in \{1, \ldots, \underline{l}\}$ and opponent responses are labeled by $\bar{x} \in \{1, \ldots, \bar{l}\}$. For simplicity we assume that $\underline{l} = \bar{l}$.

Search rule $a_1$ explores the joint moves $(1, 1), \ldots, (1, m)$ while search rule $a_2$ explores the joint moves $(1, 1), \ldots, (m, 1)$, i.e. $a_1$ exhausts opponent responses to $\underline{x} = 1$ while $a_2$ only samples one opponent response to each of its $m$ possible moves. For the move-choosing rule we apply the Bayes optimal rule used earlier: select the move $\underline{x}$ that has the highest expected $g(\underline{x})$ value when averaged over payoff functions consistent with the observed population.

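To start we determine the expected performance of an algorithm which does not have the benefit of knowing any opponent responses. In this case we average the performance, $g(\underline{x})$ for any element $\underline{x}$, over all $|Y|^{\underline{l}\bar{l}}$ functions. We note that for any given player move $\underline{x}$ the $|Y|^{\bar{l}}$ possible function values at the joint moves $(\underline{x}, \cdot)$ are replicated $|Y|^{\underline{l}\bar{l}} / |Y|^{\bar{l}} = |Y|^{(\underline{l} - 1)\bar{l}}$ times. The number of times that a $g(\underline{x})$ value of $1 - i/|Y|$ is attained in the first $|Y|^{\bar{l}}$ distinct values is $(i + 1)^{\bar{l}} - i^{\bar{l}}$. Thus the average $g(\underline{x})$ value, which we denote $\langle g \rangle$, is

$$\langle g \rangle = \sum_{i=0}^{|Y|-1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar{l}}(i),$$

where $\pi_{\bar{l}}(i) = [(i + 1)/|Y|]^{\bar{l}} - [i/|Y|]^{\bar{l}}$. This average value is obtained for all moves $\underline{x}$. In the continuum limit where $|Y| \to \infty$ the expected value of $g$ is simply

$$\langle g \rangle = 1/(1 + \bar{l}).$$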


This serves as a baseline for comparison; any algorithm which 
samples some opponent responses has to do better than this. 
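The baseline can be checked numerically. The following is a minimal sketch, assuming the payoff set is $Y = \{1/|Y|, 2/|Y|, \dots, 1\}$; the function names `pi`, `baseline`, and `baseline_brute_force` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import product


def pi(i, n_y, lbar):
    """Probability that the minimum of lbar uniform payoffs equals 1 - i/n_y."""
    return Fraction(i + 1, n_y) ** lbar - Fraction(i, n_y) ** lbar


def baseline(n_y, lbar):
    """<g>: expected minimum of lbar payoffs drawn uniformly from Y."""
    return sum((1 - Fraction(i, n_y)) * pi(i, n_y, lbar) for i in range(n_y))


def baseline_brute_force(n_y, lbar):
    """The same average obtained by enumerating all n_y**lbar response rows."""
    values = [Fraction(j, n_y) for j in range(1, n_y + 1)]
    rows = list(product(values, repeat=lbar))
    return sum(min(row) for row in rows) / len(rows)


if __name__ == "__main__":
    # Exact agreement for a small payoff set.
    print(baseline(3, 2), baseline_brute_force(3, 2))
    # Approach to the continuum value 1/(1 + lbar) for a fine payoff grid.
    print(float(baseline(2000, 3)), 1 / (1 + 3))
```

Exact rational arithmetic is used so that the weighted sum and the enumeration can be compared for equality rather than to within a tolerance.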

Next we consider the algorithm $a_1$ which exhaustively
explores all opponent responses to $x = 1$. There are $|Y|^{\bar l}$ possible
populations that this algorithm might see. For each of these
populations, $d_m$, we need to determine $g(1)$ and the average $g$
values for each of the other moves $x \neq 1$. This average is taken
over the $|Y|^{\bar l(l-1)}$ functions consistent with $d_m$. Of course
we have $g(1) = \min d^y_m$ and the expected $g(x)$ values for
$x \neq 1$ are all equal to $\langle g \rangle$ above. Since the move-choosing rule
maximizes the expected value of $g$, the expected performance
of $a_1$ for this population is $\max(\min d^y_m, \langle g \rangle)$. Averaged over
all functions (populations) the expected performance of $a_1$ is

$$\langle g \rangle_1 = \frac{1}{|Y|^{\bar l}} \sum_{d^y_m} \max(\min d^y_m, \langle g \rangle)$$


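where the sum is over all $|Y|^{\bar l}$ possible populations. Converting
the sum over all populations into a sum over the minimum
value of the population we find

$$\begin{aligned}
\langle g \rangle_1 &= \sum_{i=0}^{|Y|-1} \max\left(1 - \frac{i}{|Y|}, \langle g \rangle\right) \pi_{\bar l}(i) \\
&= \sum_{i=0}^{\lceil |Y|(1-\langle g \rangle) \rceil - 1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar l}(i) + \langle g \rangle \sum_{i=\lceil |Y|(1-\langle g \rangle) \rceil}^{|Y|-1} \pi_{\bar l}(i).
\end{aligned}$$

If we define $i_g = \lceil |Y|(1 - \langle g \rangle) \rceil$ then we obtain

$$\langle g \rangle_1 = \sum_{i=0}^{i_g - 1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar l}(i) + \langle g \rangle \left\{1 - \left(\frac{i_g}{|Y|}\right)^{\bar l}\right\}.$$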


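In the continuum limit we have

$$\begin{aligned}
\langle g \rangle_1 &= (1 - \langle g \rangle)^{\bar l} \left(1 - \frac{\bar l}{1 + \bar l}(1 - \langle g \rangle)\right) + \langle g \rangle \left(1 - (1 - \langle g \rangle)^{\bar l}\right) \\
&= \frac{1}{1 + \bar l} \left(1 + \left(\frac{\bar l}{1 + \bar l}\right)^{1 + \bar l}\right)
\end{aligned}$$

where we have recalled the expected value $\langle g \rangle = 1/(1 + \bar l)$.
We note that as $\bar l \to \infty$ the performance of algorithm $a_1$ is
$(1 + e^{-1})$ times that of $\langle g \rangle$.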
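As a sanity check on the continuum result, the sketch below evaluates the closed form and compares it with a direct midpoint-rule evaluation of $\int_0^{1-\langle g \rangle} (1-y)\,\bar l\, y^{\bar l - 1}\,dy + \langle g \rangle (1 - (1-\langle g \rangle)^{\bar l})$, which is the continuum version of the sum above. The function names and the choice of quadrature rule are assumptions made for illustration.

```python
import math


def g_baseline(lbar):
    """Continuum baseline <g> = 1 / (1 + lbar)."""
    return 1.0 / (1.0 + lbar)


def g1_closed_form(lbar):
    """Continuum expected performance of a_1."""
    return (1.0 + (lbar / (1.0 + lbar)) ** (lbar + 1)) / (1.0 + lbar)


def g1_by_quadrature(lbar, steps=20000):
    """Midpoint-rule evaluation of the continuum limit of the sum for <g>_1."""
    g = g_baseline(lbar)
    upper = 1.0 - g                      # continuum analogue of i_g / |Y|
    h = upper / steps
    integral = 0.0
    for n in range(steps):
        y = (n + 0.5) * h
        # Density of the population minimum being 1 - y is lbar * y**(lbar - 1).
        integral += (1.0 - y) * lbar * y ** (lbar - 1) * h
    return integral + g * (1.0 - upper ** lbar)


if __name__ == "__main__":
    for lbar in (2, 5, 20):
        print(lbar, g1_closed_form(lbar), g1_by_quadrature(lbar))
    big = 10 ** 6
    print(g1_closed_form(big) / g_baseline(big), 1.0 + math.exp(-1.0))
```

The last line illustrates the stated large-$\bar l$ behaviour: the ratio to the baseline approaches $1 + e^{-1}$.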

The analysis of algorithm $a_2$ is slightly more complex. In
this case each element of the population occurs at a different
$x$. For any given observed population the optimal move for
the player is $x^* = \arg\max d^y_m$ (i.e. the move corresponding
to the largest payoff observed in the population). With this
insight, we observe that when summing over all functions,
there are $|Y|^{\bar l - 1}$ possible completions to $\max d^y_m$ for the
remaining $\bar l - 1$ unobserved responses to $x^*$. We must take
the minimum over these possible completions to determine the
expected value of $g$. Thus the expected payoff for algorithm
$a_2$ when averaging over all functions (populations) is

$$\langle g \rangle_2 = \frac{1}{|Y|^{\bar l}} \sum_{d^y_m} \sum_{i=0}^{|Y|-1} \min\left(\max d^y_m, 1 - \frac{i}{|Y|}\right) \pi_{\bar l - 1}(i).$$

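We proceed in the same fashion as above by defining$^{8}$ $i_d =
|Y|(1 - \max d^y_m)$ (which depends on $d^y_m$) so that

$$\begin{aligned}
\langle g \rangle_2 &= \frac{1}{|Y|^{\bar l}} \sum_{d^y_m} \left[ \max d^y_m \sum_{i=0}^{i_d - 1} \pi_{\bar l - 1}(i) + \sum_{i=i_d}^{|Y|-1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar l - 1}(i) \right] \\
&= \frac{1}{|Y|^{\bar l}} \sum_{d^y_m} \left[ \max d^y_m \left(\frac{i_d}{|Y|}\right)^{\bar l - 1} + \sum_{i=i_d}^{|Y|-1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar l - 1}(i) \right].
\end{aligned}$$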




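The sum over populations is now tackled by converting it to a
sum over the $|Y|$ possible values of $\max d^y_m$. The number of
sequences of length $\bar l$ having maximum value $j$ is $j^{\bar l} - (j-1)^{\bar l}$.
Moreover, if $\max d^y_m = j/|Y|$ then $i_d = |Y| - j$ and so

$$\langle g \rangle_2 = \sum_{j=1}^{|Y|} \frac{j^{\bar l} - (j-1)^{\bar l}}{|Y|^{\bar l}} \left[ \frac{j}{|Y|} \left(1 - \frac{j}{|Y|}\right)^{\bar l - 1} + \sum_{i=|Y|-j}^{|Y|-1} \left(1 - \frac{i}{|Y|}\right) \pi_{\bar l - 1}(i) \right].$$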




8 There is no need to take the ceiling since $i_d$ is automatically an integer.





The continuum limit in this case is found as

$$\begin{aligned}
\langle g \rangle_2 &= \bar l \int_0^1 dy_j \, y_j^{\bar l - 1} \left[ y_j (1 - y_j)^{\bar l - 1} + (\bar l - 1) \int_{1 - y_j}^{1} dy \, (1 - y) \, y^{\bar l - 2} \right] \\
&= \int_0^1 dy_j \, y_j^{\bar l} (1 - y_j)^{\bar l - 1} + \int_0^1 dy_j \, y_j^{\bar l - 1} - \int_0^1 dy_j \, y_j^{\bar l - 1} (1 - y_j)^{\bar l - 1} \\
&= B(\bar l + 1, \bar l) + 1/\bar l - B(\bar l, \bar l)
\end{aligned}$$

where $B(x, y)$ is the beta function defined by $B(x, y) =
\Gamma(x)\Gamma(y)/\Gamma(x + y)$. For large $\bar l$ the Beta functions almost
cancel and the expected performance for $a_2$ varies as $1/\bar l$,
which is only slightly better than the performance of the
algorithm which does not have access to any training data.

In Figure 1 we plot the expected performance of $a_1$ and
$a_2$ as a function of $\bar l$ (recall that $m = l = \bar l$). Algorithm $a_1$
outperforms algorithm $a_2$ on average for all values of $\bar l$.
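For very small problems the two averages can be confirmed by exhaustive enumeration of every payoff function. The sketch below assumes $Y = \{1/|Y|, \dots, 1\}$ and $l = \bar l = m$, lets $a_1$ fall back to move 2 when it rejects move 1, and breaks ties in $a_2$ by taking the first maximizer; these conventions and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import ceil


def pi(i, n_y, n):
    """Probability that the minimum of n uniform payoffs equals 1 - i/n_y."""
    return Fraction(i + 1, n_y) ** n - Fraction(i, n_y) ** n


def g_mean(n_y, l):
    return sum((1 - Fraction(i, n_y)) * pi(i, n_y, l) for i in range(n_y))


def g1_formula(n_y, l):
    """Discrete expected performance of a_1 (sum split at i_g)."""
    g = g_mean(n_y, l)
    i_g = ceil(n_y * (1 - g))
    head = sum((1 - Fraction(i, n_y)) * pi(i, n_y, l) for i in range(i_g))
    return head + g * (1 - Fraction(i_g, n_y) ** l)


def g2_formula(n_y, l):
    """Discrete expected performance of a_2 (sum over the observed maximum j)."""
    total = Fraction(0)
    for j in range(1, n_y + 1):
        weight = Fraction(j ** l - (j - 1) ** l, n_y ** l)
        tail = sum((1 - Fraction(i, n_y)) * pi(i, n_y, l - 1)
                   for i in range(n_y - j, n_y))
        first = Fraction(j, n_y) * (1 - Fraction(j, n_y)) ** (l - 1)
        total += weight * (first + tail)
    return total


def brute_force(n_y, l):
    """Average g of the chosen move over all n_y**(l*l) payoff functions."""
    values = [Fraction(j, n_y) for j in range(1, n_y + 1)]
    g = g_mean(n_y, l)
    perf1 = perf2 = Fraction(0)
    count = 0
    for flat in product(values, repeat=l * l):
        f = [flat[x * l:(x + 1) * l] for x in range(l)]  # f[x][xbar]
        move1 = 0 if min(f[0]) > g else 1                # a_1 saw all of row 0
        perf1 += min(f[move1])
        move2 = max(range(l), key=lambda x: f[x][0])     # a_2 saw column 0
        perf2 += min(f[move2])
        count += 1
    return perf1 / count, perf2 / count


if __name__ == "__main__":
    for n_y, l in ((2, 2), (3, 2), (2, 3)):
        print(n_y, l, brute_force(n_y, l), (g1_formula(n_y, l), g2_formula(n_y, l)))
```

With $|Y| = 2$ and $l = 2$ the enumeration gives $23/32$ for $a_1$ and $11/16$ for $a_2$, so $a_1$ is ahead even in the smallest nontrivial case.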

Though $a_1$ outperforms $a_2$ on average, it is interesting
to determine the fraction of functions where $a_1$ will
perform no worse than $a_2$. This fraction is given by
$|Y|^{-l\bar l} \sum_f \theta(\mathrm{perf}_1(f) - \mathrm{perf}_2(f))$ where $\mathrm{perf}_1(f)$ is the
performance of algorithm $a_1$ on payoff function $f$, $\mathrm{perf}_2(f)$ is the
performance of algorithm $a_2$ on the same $f$, and $\theta$ is a step
function defined as $\theta(x) = 1$ if $x \ge 0$ and $\theta(x) = 0$ otherwise.
The Bayes optimal payoff for $a_1$ for any given payoff function
$f$ is$^{9}$

$$\mathrm{perf}_1(f) = \begin{cases} \min_{\bar x} f(1, \bar x) & \text{if } \min_{\bar x} f(1, \bar x) > \langle g \rangle \\ \min_{\bar x} f(2, \bar x) & \text{otherwise.} \end{cases}$$

Similarly, the performance of algorithm $a_2$ is given by
$\mathrm{perf}_2(f) = \min_{\bar x} f(x_2, \bar x)$ where $x_2 = \arg\max_x d^y_m$, i.e. the
move $x$ with the largest observed payoff $f(x, 1)$.

To determine the performance of the algorithms for any given
$f$ we divide $f$ into its relevant and irrelevant components as
follows:

$$\begin{aligned}
\frac{j_1}{|Y|} &= f(1, 1), & \frac{k_1}{|Y|} &= \min\{f(1, \bar x) \mid \bar x \neq 1\}, \\
\frac{j_2}{|Y|} &= f(2, 1), & \frac{k_2}{|Y|} &= \min\{f(2, \bar x) \mid \bar x \neq 1\}, \\
\frac{l}{|Y|} &= \max\{f(x, 1) \mid x \neq 1, 2\}, & \frac{m}{|Y|} &= \min\{f(x_2, \bar x) \mid x_2 \neq 1, 2, \ \bar x \neq 1\}.
\end{aligned}$$

Here $l$ and $m$ denote payoff indices rather than the number of
moves or samples. In the definition of $m$, $x_2$ is the move chosen by $a_2$ if $a_2$
doesn't choose move $x_2 = 1$ or 2; the specific value of $x_2$ is
irrelevant. Given these definitions, the performances of the two
algorithms are

$$\mathrm{perf}_1(f) = \frac{1}{|Y|} \begin{cases} \min(j_1, k_1) & \text{if } \min(j_1, k_1) > |Y| \langle g \rangle \\ \min(j_2, k_2) & \text{otherwise} \end{cases}$$

and

$$\mathrm{perf}_2(f) = \frac{1}{|Y|} \begin{cases} \min(j_1, k_1) & \text{if } \max(j_1, j_2, l) = j_1 \\ \min(j_2, k_2) & \text{if } \max(j_1, j_2, l) = j_2 \\ \min(l, m) & \text{otherwise} \end{cases}$$

respectively.

In summing the above expressions over $f$ we replace the
sum over $f$ with a sum over $j_1$, $j_2$, $k_1$, $k_2$, $l$ and $m$ using
the appropriate multiplicities. The resulting sums are then
converted to integrals in the continuum limit and evaluated
by Monte Carlo. Details are presented in Appendix D.

The results are shown in Figure 2 which plots the fraction
of functions for which $\mathrm{perf}_1 \ge \mathrm{perf}_2$. This plot was generated
using $10^7$ Monte Carlo samples per $\bar l$ value.

9 We have arbitrarily assumed that $a_1$ will select move 2 if it does not select
move 1. This choice has no bearing on the result.
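The comparison of $\mathrm{perf}_1$ and $\mathrm{perf}_2$ can also be simulated directly, without the reduction to $j_1, j_2, k_1, k_2, l, m$. The following sketch draws each payoff independently and uniformly from $[0, 1]$ (the continuum limit), uses $\langle g \rangle = 1/(1 + \bar l)$ as the threshold for $a_1$, and counts how often $a_1$ does no worse than $a_2$. The sample count, seed, and function name are arbitrary illustrative choices, and this is not the importance-sampling scheme of Appendix D.

```python
import random


def fraction_a1_no_worse(l, samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of payoff functions with perf_1 >= perf_2.

    f[x][xbar] holds the payoff for agent move x and opponent response xbar,
    with l agent moves and l opponent responses (l >= 2).
    """
    rng = random.Random(seed)
    g_mean = 1.0 / (1.0 + l)
    no_worse = 0
    for _ in range(samples):
        f = [[rng.random() for _ in range(l)] for _ in range(l)]
        # a_1 observed every response to move 0; it keeps that move only if
        # its true value beats the baseline, otherwise it plays move 1.
        g_first = min(f[0])
        perf1 = g_first if g_first > g_mean else min(f[1])
        # a_2 observed response 0 to every move and plays the best-looking one.
        x2 = max(range(l), key=lambda x: f[x][0])
        perf2 = min(f[x2])
        if perf1 >= perf2:
            no_worse += 1
    return no_worse / samples


if __name__ == "__main__":
    for l in (2, 3, 5, 10):
        print(l, fraction_a1_no_worse(l))
```

Because the payoffs are continuous, ties occur only when both algorithms choose the same move, so the distinction between "better" and "no worse" only affects those cases.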


A. Better Training Algorithms 

In the previous sections we constructed Bayes optimal 
algorithms in limited settings by using specially constructed 
deterministic rules a and A. This alone is sufficient to demon- 
strate the availability of free lunches in self-play contexts. 
However, we can build on these insights to construct even 
better (and even worse) algorithms by also determining (at 
least partially) the Bayes-optimal search rule, (a, A), which 
builds out the training set, and selects the champion strategy. 
That analysis would parallel the approach taken in [MW98] 
used to study bandit problems, and would further increase the 
performance gap between the (best, worst) pair of algorithms. 

VII. The Role of Opponent “Intelligence” 

All results thus far have been driven by measuring performance
based on $g(x) = \min_{\bar x} f(x, \bar x)$. This is a very
pessimistic measure as it assumes that the agent's opponent
is omniscient and will make the move most detrimental to the
agent. If the opponent is not omniscient and cannot determine
$\bar x^* = \arg\min_{\bar x} f(x, \bar x)$, how does this affect the availability
of free lunches?

Fig. 1. Expected performance of algorithm $a_1$ (indicated as $\langle g \rangle_1$), which exhaustively enumerates the opponent's responses to a particular move, and algorithm
$a_2$ (indicated as $\langle g \rangle_2$), which samples only 1 opponent response to each move. For comparison we also plot $\langle g \rangle$, which is the expected performance of an
algorithm which does no sampling of opponent responses.

Fig. 2. The fraction of functions in the continuum limit where $\mathrm{perf}_1 \ge \mathrm{perf}_2$. The figure was generated with $10^7$ Monte Carlo samples of the integrand for
each value of $\bar l$.

Perhaps the simplest way to quantify the intelligence of the
opponent is through the fraction, $\alpha$, of payoff values known
to the opponent. The opponent will use these known values to
estimate its optimal move $\bar x^*$. We have already examined the
$\alpha = 1$ limit corresponding to maximal intelligence, where the
opponent can always determine $\bar x^*$ and we have free lunches.
In the $\alpha = 0$ limit the opponent can only make random replies,
so that the expected performance of the agent will be the
average over the opponent's possible responses.

One way to approach this problem is to build the opponent's
bounded intelligence into the agent's payoff function $g$ and
proceed as we did in the omniscient case. If $|X|$ is the number
of joint moves, then there are $\binom{|X|}{\alpha|X|}$ possible subsets of joint
moves of size $\alpha|X|$.$^{10}$ We indicate the list of possible subsets
as $S(X, \alpha|X|)$, and a particular subset by $S_i \in S$. For this
particular subset $\bar x^*$ is estimated by selecting the best response
out of the $S_i$ payoff values known to the opponent. Of course,
it may be the case that there are no samples in $S_i$ having the
agent's move $x$, and in that case the opponent can only select
a random response. In this case the agent will on average
obtain the average payoff $\sum_{\bar x} f(x, \bar x)/\bar l(x)$. If we assume that
all subsets of size $\alpha|X|$ are equally likely, then the agent's
payoff function against a boundedly intelligent opponent is
given by

$$g_\alpha(x) = \binom{|X|}{\alpha|X|}^{-1} \sum_{S_i \in S(X, \alpha|X|)} \min_{(x, \bar x) \in S_i} f(x, \bar x),$$

with the average payoff substituted for the minimum whenever $S_i$
contains no sample with agent move $x$.

10 We assume that $\alpha$ is an integral multiple of $1/|X|$.


This generalization reduces to the previously assumed $g$ in the
maximally intelligent $\alpha = 1$ case. In Table II the functions
$g_{1/4}$, $g_{2/4}$, $g_{3/4}$, and $g_{4/4}$ are listed for the example of
Section IV. As expected, the payoff to the agent increases
with decreasing $\alpha$ (a less intelligent opponent). However, we
also observe that for the same population $d_2$ the average
$[g(x = 1) \ g(x = 2)]^T$ values are $[5/8 \ 7/8]^T$ for $\alpha = 1/4$,
$[29/48 \ 41/48]^T$ for $\alpha = 2/4$, $[9/16 \ 13/16]^T$ for $\alpha =
3/4$, and $[1/2 \ 3/4]^T$ for $\alpha = 4/4$. For this population $d_2$ and
fitness function, $(a, A_{best})$ continues to beat $(a, A_{worst})$, and by
the same amount independent of $\alpha$.

TABLE II

Exhaustive enumeration of all 16 possible agent payoffs, $g_\alpha(x = 1)$, $g_\alpha(x = 2)$, for boundedly intelligent opponents having
intelligence parameter $\alpha = 1/4$, $\alpha = 2/4$, $\alpha = 3/4$, and $\alpha = 4/4$. See Table I for the corresponding $f$ functions and for the $\alpha = 1$ $g$
function. The payoff functions labeled in bold are those consistent with the population $d_2 = \{(1, 2; 1/2), (2, 2; 1)\}$.
[Table entries not recoverable from the source.]


VIII. Conclusions

We have introduced a general framework for analyzing
NFL issues in a variety of contexts. When applied to self-play
we have proven the existence of pairs of algorithms one
of which is superior to another for all possible joint payoff
functions $f$. This result stands in marked contrast to similar
analyses for optimization in non-self-play settings. Basically,
the result arises because under a minimax criterion the sum
over all payoff functions $f$ is not equivalent to a sum over
all functions $\min_{\bar x} f(\cdot, \bar x)$. We have shown that for simple
algorithms we can calculate expected performance over all
possible payoff functions and in some cases determine the
fraction of functions where one algorithm outperforms another.
On the other hand, we have also shown that for the more
general biological coevolutionary settings, where there is no
sense of a "champion" like there is in self-play, NFL still
applies.

Clearly we have only scratched the surface of an analysis of 
coevolutionary and self-play optimization. Many of the same 
questions posed in the traditional optimization setting can be 
asked in this more general setting. Such endeavors may be 
particularly rewarding at this time given the current interest in 
the use of game theory and self-play for multi-agent systems 
[PW02]. 
Appendix


A. Determination of $\pi_{k,r}(g)$: distinct Y

To determine $\pi_{k,r}(g)$ we first consider the case where all
$Y$ values are distinct and then consider the possibility of
duplicate $Y$ values. Though we only present the non-distinct
case in the main text, we derive the distinct $Y$ case here because
we can obtain a closed form expression for the probability and
because it serves as a simpler introduction to the case of
non-distinct $Y$.

To derive the result we generalize from a concrete example.
Consider the case where $|Y| = 10$, $l(x) = 5$, and $k = 3$.
A particular instantiation is presented in Figure 3. In this
case $r = 4/10$, which is not the true minimum for responses
to $x$. The probability that $r$ is the true minimum is simply
$k/l(x)$. If $r$ is not the true minimum then $P(g|d_m)$ is found
as follows. $P(g = 1/10|d_m)$ is the fraction of functions
containing $Y$ values at $\{1/10\} \cup d^y_m(x)$.$^{11}$ Since the total number
of possibilities consistent with the data is $\binom{|Y|-k}{l(x)-k}$, this fraction
is $\binom{|Y|-k-1}{l(x)-k-1} / \binom{|Y|-k}{l(x)-k} = (l(x) - k)/(|Y| - k)$. Similarly,
$P(g = 2/10|d_m)$ is $\binom{|Y|-k-2}{l(x)-k-1} / \binom{|Y|-k}{l(x)-k}$ since we know
that the function cannot contain a sample having fitness less
than 2/10.

Thus, in the general case, we have

$$\pi_{k,r}(g) = \binom{a}{b}^{-1} \left\{ \theta(r - g) \binom{a - |Y|g}{b - 1} + \delta_{g,r} \binom{a - |Y|r + 1}{b} \right\}$$

11 By $d^y_m(x)$ we mean the set of $Y$ values sampled at $x$.
Fig. 3. Row 1 indicates the $Y$ values obtainable on a particular payoff function $f$ for each of the $l(x) = 5$ possible opponent responses. Row 2 gives the
$Y$ values actually observed during the training period. Row 3 gives the probabilities of $g$ assuming a uniform probability density across the $f$ which are
consistent with $d_m$: $P(g|d_m) = 6/21$, $5/21$, $4/21$, $6/21$ for $g = 1/10$, $2/10$, $3/10$, $4/10$, and 0 for $g \ge 5/10$. The expected value of $g$ under $P(g|d_m)$ is 2.48/10.


where $a = |Y| - k$, $b = l(x) - k$, $\theta(x) = 1$ iff $x > 0$, and
$\delta_{g,r} = 1$ iff $g = r$. Since it is easily verified that

$$\binom{a - |Y|r + 1}{b} + \sum_{g'=1}^{|Y|r - 1} \binom{a - g'}{b - 1} = \binom{a}{b},$$

this probability is correctly normalized. The expected value of
$g$ is therefore

$$E(g|d_m) = \frac{1}{|Y|} \binom{a}{b}^{-1} \left\{ |Y|r \binom{a - |Y|r + 1}{b} + \sum_{g'=1}^{|Y|r - 1} g' \binom{a - g'}{b - 1} \right\}.$$

Evaluating this sum we find

$$E(g|d_m) = \frac{1}{|Y|} \binom{a}{b}^{-1} \left[ \binom{a + 1}{b + 1} - \binom{a + 1 - |Y|r}{b + 1} \right] = \frac{1}{|Y|} \, \frac{(a+1)^{\underline{b+1}} - (a + 1 - |Y|r)^{\underline{b+1}}}{(b + 1) \, a^{\underline{b}}}$$

where the falling power, $a^{\underline{b}}$, is defined by $a^{\underline{b}} = a(a-1)(a-2) \cdots (a - b + 1)$.
For the case at hand where $|Y| = 10$, $l(x) = 5$, and $k = 3$ we have
$a = 7$ and $b = 2$. Since $r = 4/10$ the expected value is
$E(g|d_m) = \frac{1}{10}(8^{\underline{3}} - 4^{\underline{3}})/(3 \cdot 7^{\underline{2}}) = \frac{1}{10} \cdot 52/21 \approx 2.48/10$.


B. Determination of $\pi_{k,r}(g)$: non-distinct Y

In Figure 4 we present another example where $l(x) = 5$,
$k = 3$, and $r = 4/10$. In this case however, there are duplicate
$Y$ values. The total number of functions consistent with the
data is $|Y|^{l(x)-k} = |Y|^b$. In this case it is easiest to begin
the analysis with the case $g = r$. Measuring payoffs in units of
$1/|Y|$, the number of functions
having the minimum of the remaining $b$ points equal to $|Y|$
is 1. Similarly, the number of functions having a minimum
value of $(|Y| - 1)$ is $2^b - 1$. $2^b$ counts the number of functions
where the $b$ function values can assume one of $|Y|$ or $|Y| - 1$.
The $-1$ accounts for the fact that 1 of these functions has a
minimum value of $|Y|$ and not $|Y| - 1$. Generally, the number
of functions having a minimum value of $r'$ is $(|Y| - |Y|r' +
1)^b - (|Y| - |Y|r')^b$. All $r' \ge r$ will result in the minimal
observed value $r$ so that the total number of functions having
an observed minimum of $r$ is

$$\sum_{r'=r}^{1} \left[ (|Y| - |Y|r' + 1)^b - (|Y| - |Y|r')^b \right] = (|Y| - |Y|r + 1)^b.$$

Thus the probability of $g = r$ is

$$\pi_{k,r}(g = r) = |Y|^{-b} (|Y| - |Y|r + 1)^b.$$

We turn now to determining the probabilities where $g < r$.
Of the $b$ remaining $Y$ values the probability that the minimum
is $g$ is

$$\pi_{k,r}(g) = |Y|^{-b} \left\{ (|Y| - |Y|g + 1)^b - (|Y| - |Y|g)^b \right\}.$$

Combining these results we obtain the final result

$$\pi_{k,r}(g) = \theta(r - g) \left\{ \left(1 - g + \frac{1}{|Y|}\right)^b - (1 - g)^b \right\} + \delta_{r,g} \left(1 - r + \frac{1}{|Y|}\right)^b.$$

Given $\pi_{k,r}(g)$ the expectation value of $g$ is found as

$$\begin{aligned}
E(g|d_m) &= r \left(1 - r + \frac{1}{|Y|}\right)^b + \sum_{g=1/|Y|}^{r - 1/|Y|} g \left\{ \left(1 - g + \frac{1}{|Y|}\right)^b - (1 - g)^b \right\} \\
&= \frac{1}{|Y|} \sum_{r'=1/|Y|}^{r} \left(1 - r' + \frac{1}{|Y|}\right)^b
\end{aligned}$$

where we have cancelled appropriate terms in the telescoping
sum, and the sums run in steps of $1/|Y|$. If we define $S_k(n) = \sum_{i=1}^{n} i^k$ then we can evaluate the
last sum to find

$$E(g|d_m) = |Y|^{-(b+1)} \left\{ S_b(|Y|) - S_b(|Y| - |Y|r) \right\}.$$

Though there is no closed form expression for $S_k(n)$, it can be
expanded recursively in terms of $S_j(n)$ for $j < k$.
The recursion is based upon $S_0(n) = n$.

In the concrete case above where $|Y| = 10$, $r = 4/10$, and
$b = 2$ the expected value is $\frac{1}{10} \cdot 294/100 = 2.94/10$.
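The two worked examples (distinct and non-distinct $Y$ values) can be checked with exact arithmetic. The sketch below indexes payoffs by integers, so that $g = g_{idx}/|Y|$ and $r = r_{idx}/|Y|$; the function names and this integer indexing are illustrative assumptions, and $S_b(n)$ is evaluated by direct summation rather than by a recursion.

```python
from fractions import Fraction
from math import comb


def pi_distinct(g_idx, r_idx, n_y, l_x, k):
    """P(g = g_idx/n_y | data) when all Y values at x are distinct."""
    a, b = n_y - k, l_x - k
    if g_idx < r_idx:
        return Fraction(comb(a - g_idx, b - 1), comb(a, b))
    if g_idx == r_idx:
        return Fraction(comb(a - r_idx + 1, b), comb(a, b))
    return Fraction(0)


def pi_repeated(g_idx, r_idx, n_y, b):
    """P(g = g_idx/n_y | data) when duplicate Y values are allowed."""
    if g_idx < r_idx:
        return Fraction((n_y - g_idx + 1) ** b - (n_y - g_idx) ** b, n_y ** b)
    if g_idx == r_idx:
        return Fraction((n_y - r_idx + 1) ** b, n_y ** b)
    return Fraction(0)


def expectation(probabilities, n_y):
    """Expected g given a map from integer payoff index to probability."""
    return sum(Fraction(g_idx, n_y) * p for g_idx, p in probabilities.items())


def expectation_repeated_closed(n_y, r_idx, b):
    """E(g|d_m) = n_y**-(b+1) * (S_b(n_y) - S_b(n_y - r_idx)), by direct summation."""
    s_hi = sum(i ** b for i in range(1, n_y + 1))
    s_lo = sum(i ** b for i in range(1, n_y - r_idx + 1))
    return Fraction(s_hi - s_lo, n_y ** (b + 1))


if __name__ == "__main__":
    distinct = {g: pi_distinct(g, 4, 10, 5, 3) for g in range(1, 11)}
    repeated = {g: pi_repeated(g, 4, 10, 2) for g in range(1, 11)}
    print(expectation(distinct, 10))               # 26/105, i.e. about 2.48/10
    print(expectation(repeated, 10))               # 147/500, i.e. 2.94/10
    print(float(expectation_repeated_closed(1000, 400, 2)))
    print((1 - (1 - 0.4) ** 3) / 3)                # integral approximation of the sum
```

The last two lines compare the exact sum at $|Y| = 1000$ with the value of $\int_0^r (1 - r')^b\, dr'$, which approximates that sum for a fine payoff grid.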


Fig. 4. Row 1 indicates the $Y$ values obtainable on a particular payoff function $f$ for each of the $l(x) = 5$ possible opponent responses. Row 2 gives the
$Y$ values actually observed during the training period. Row 3 gives the probabilities of $g$ assuming a uniform probability density across the $f$ which are
consistent with $d_m$. Note that unlike Fig. 3 there are some duplicate $Y$ values. The expected value of $g$ under $P(g|d_m)$ is 2.94/10.


C. Continuum Limit

In the limit where $|Y| \to \infty$ we can approximate the
expectation $E(g|d_m)$ given by the sum

$$E(g|d_m) = \frac{1}{|Y|} \sum_{r'=1/|Y|}^{r} \left(1 - (r' - 1/|Y|)\right)^b = \frac{1}{|Y|} \sum_{r'=0}^{r - 1/|Y|} (1 - r')^b$$

by the integral

$$E(g|d_m) = \int_0^r dr' \, (1 - r')^b = \frac{1}{b + 1} \left\{ 1 - (1 - r)^{b+1} \right\}. \qquad (7)$$

The prediction made by this approximation at $|Y| = 10$,
$r = 4/10$, and $b = 2$ is 2.61/10 as opposed to the correct
result of 2.94/10. However, had $|Y| = 1000$ and $r = 400/1000$ the
accurate result is 261.65/1000 while the approximation gives
261.3/1000.


D. Performance Comparison

In this appendix we evaluate the fraction of functions for
which $a_1$ performs better than or equal to algorithm $a_2$, where $a_1$
and $a_2$ are defined as in Section VI.

The function $\theta(\mathrm{perf}_1(f) - \mathrm{perf}_2(f))$ is equal to 1 if

$$c d_1 + e_1 c d_2 + e_2 c \bar d_1 \bar d_2 + \bar e_1 \bar c d_1 + \bar c d_2 + e_3 \bar c \bar d_1 \bar d_2$$

is true, where $c = (\min(j_1, k_1) > |Y| \langle g \rangle)$, $d_1 = (\max(j_1, j_2, l) =
j_1)$, $d_2 = (\max(j_1, j_2, l) = j_2)$, $e_1 = (\min(j_1, k_1) >
\min(j_2, k_2))$, $e_2 = (\min(j_1, k_1) > \min(l, m))$, and $e_3 =
(\min(j_2, k_2) > \min(l, m))$. In the above Boolean expression
we have used the condensed notation $ab = a \wedge b$, $a + b = a \vee b$,
and $\bar a = \neg a$. It is convenient to factor the Boolean expression
as

$$c (d_1 + e_1 d_2 + e_2 \bar d_1 \bar d_2) + \bar c (\bar e_1 d_1 + d_2 + e_3 \bar d_1 \bar d_2).$$

To give the fraction of functions where $a_1$ performs no worse
than $a_2$ this expression is to be summed over $j_1$, $j_2$, $k_1$, $k_2$,
$l$, and $m$ with appropriate multiplicities. The multiplicities are
given in Table III.

TABLE III

Multiplicities occurring when converting the sum over $f$ to a
sum over the allowed values of $j_1$, $j_2$, $k_1$, $k_2$, $l$, and $m$.

  Variable   Multiplicity
  $k_1$      $(|Y| - k_1 + 1)^{\bar l - 1} - (|Y| - k_1)^{\bar l - 1}$
  $k_2$      $(|Y| - k_2 + 1)^{\bar l - 1} - (|Y| - k_2)^{\bar l - 1}$
  $l$        $l^{\bar l - 2} - (l - 1)^{\bar l - 2}$
  $m$        $(|Y| - m + 1)^{\bar l - 1} - (|Y| - m)^{\bar l - 1}$

In the continuum limit this sum becomes the integral

$$\int_0^1 dj_1 \int_0^1 dj_2 \int_0^1 dk_1 \, P(k_1) \int_0^1 dk_2 \, P(k_2) \int_0^1 dl \, P(l) \int_0^1 dm \, P(m) \left\{ c (d_1 + e_1 d_2 + e_2 \bar d_1 \bar d_2) + \bar c (\bar e_1 d_1 + d_2 + e_3 \bar d_1 \bar d_2) \right\}$$

where $P(k_1) = (\bar l - 1)(1 - k_1)^{\bar l - 2}$, $P(k_2) = (\bar l - 1)(1 - k_2)^{\bar l - 2}$,
$P(l) = (\bar l - 2) l^{\bar l - 3}$, $P(m) = (\bar l - 1)(1 - m)^{\bar l - 2}$, and condition
$c$ is modified to $\min(j_1, k_1) > \langle g \rangle$. Though this integral is
difficult to evaluate analytically, it is straightforward to evaluate
by Monte Carlo importance sampling of $(j_1, j_2, k_1, k_2, l, m)$
using the respective probability distributions. Samples from
$P(u) = m(1 - u)^{m-1}$ are obtained by sampling values $v$
from $U(0, 1)$ and transforming so that $u = 1 - v^{1/m}$; samples
from $P(w) = m w^{m-1}$ are obtained via an analogous transformation of $v$.
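The sampling scheme can be sketched as follows. The transformation $u = 1 - v^{1/m}$ is the one stated in the text; the transformation $w = v^{1/m}$ used for $P(w) = m w^{m-1}$ is the standard inverse-CDF choice and is an assumption here, as are the function names, sample count, and seed. The Boolean test is the factored expression from this appendix with condition $c$ in its continuum form, and the sketch requires $\bar l \ge 3$ so that $P(l)$ is defined.

```python
import random


def sample_decreasing(rng, m):
    """Draw u with density m * (1 - u)**(m - 1) via u = 1 - v**(1/m)."""
    return 1.0 - rng.random() ** (1.0 / m)


def sample_increasing(rng, m):
    """Draw w with density m * w**(m - 1) via w = v**(1/m) (assumed inverse CDF)."""
    return rng.random() ** (1.0 / m)


def a1_no_worse(j1, j2, k1, k2, l, m, g_mean):
    """Factored Boolean expression c(d1 + e1 d2 + e2 ~d1 ~d2) + ~c(~e1 d1 + d2 + e3 ~d1 ~d2)."""
    c = min(j1, k1) > g_mean
    top = max(j1, j2, l)
    d1 = top == j1
    d2 = top == j2
    e1 = min(j1, k1) > min(j2, k2)
    e2 = min(j1, k1) > min(l, m)
    e3 = min(j2, k2) > min(l, m)
    neither = (not d1) and (not d2)
    if c:
        return d1 or (e1 and d2) or (e2 and neither)
    return ((not e1) and d1) or d2 or (e3 and neither)


def estimate_fraction(lbar, samples=20000, seed=0):
    """Monte Carlo estimate of the continuum-limit integral (lbar >= 3)."""
    rng = random.Random(seed)
    g_mean = 1.0 / (1.0 + lbar)
    hits = 0
    for _ in range(samples):
        j1, j2 = rng.random(), rng.random()
        k1 = sample_decreasing(rng, lbar - 1)
        k2 = sample_decreasing(rng, lbar - 1)
        l = sample_increasing(rng, lbar - 2)
        m = sample_decreasing(rng, lbar - 1)
        if a1_no_worse(j1, j2, k1, k2, l, m, g_mean):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    for lbar in (3, 5, 10):
        print(lbar, estimate_fraction(lbar))
```

Sampling each variable from its own density means the integrand reduces to the Boolean indicator, so the estimate is simply the fraction of samples for which the expression is true.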






